Minimizing Surprisal (Sum of POS surprisal and Word Surprisal)

Here I’m plotting data from the counterfactual languages optimized to minimize surprisal.

‘DistanceWeight’ is the distance parameter in the ordering model. Higher values indicate greater distance between head and dependent.
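As a toy illustration of how such a distance parameter could translate into a linear order (a sketch under my own assumptions, not the actual model code; all names and numbers are hypothetical):

```python
# Illustrative sketch: dependents are placed relative to the head by a
# distance score, where more negative values mean a stronger preference
# to be near the head. This is NOT the real ordering model, just a toy.

def order_dependents(head, dependents):
    """dependents: list of (word, side, distance_logit).
    side is 'DH' (dependent before head) or 'HD' (dependent after head).
    More negative logits end up nearer the head on their side."""
    before = [d for d in dependents if d[1] == "DH"]
    after = [d for d in dependents if d[1] == "HD"]
    # Left of the head: closest (most negative) dependent comes last;
    # right of the head: closest comes first.
    before.sort(key=lambda d: d[2], reverse=True)
    after.sort(key=lambda d: d[2])
    return [d[0] for d in before] + [head] + [d[0] for d in after]

# With a hypothetical logit for each modifier, the adjective (most
# negative) lands adjacent to the noun:
print(order_dependents("noun", [("det", "DH", -0.5),
                                ("adj", "DH", -2.0),
                                ("num", "DH", -1.0)]))
# → ['det', 'num', 'adj', 'noun']
```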

Distance logits for Det, Num, Adj

More negative values indicate stronger preference to be close to the noun. Adjectives are clearly predicted to be closer to the noun than determiners and numerals.

Distance logits for Det, Num, Adj, by word order

More negative values indicate stronger preference to be close to the noun.

In general, the model has a strong tendency to place Det, Num, and Adj before the noun. Therefore, here I include runs where I forced Det, Num, and Adj to appear after the noun (leaving the rest unchanged). While the logits in the Head-Dependent (HD) case are shifted by a constant relative to the Dependent-Head (DH) case, such a shift is not meaningful in my ordering model, since only the relative differences between logits affect the predicted order. Taking this into account, no differences are predicted between pre- and postnominal modifiers (nothing like Universal 20 seems to emerge).
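That a constant shift of the logits carries no information can be seen directly: only the relative differences between logits determine which dependent lands closer to the noun. A minimal sketch with hypothetical numbers:

```python
def rank_by_closeness(logits):
    """Sort dependents from closest to farthest from the head,
    assuming more negative distance logits mean closer."""
    return sorted(logits, key=logits.get)

dh = {"det": -0.2, "num": -0.8, "adj": -1.5}   # hypothetical prenominal logits
hd = {k: v + 3.0 for k, v in dh.items()}       # same logits shifted by a constant

print(rank_by_closeness(dh))  # → ['adj', 'num', 'det']
print(rank_by_closeness(hd))  # identical ranking: the shift is invisible
```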

##############

#lmCoreDependents = lmPerHead %>% filter(Head == "VERB") %>% filter(Count > 1000)
#lm_CoreDependents = lmCoreDependents %>% filter(Dependency %in% c("obj", "iobj", "nsubj", "nsubjpass"))
#
## distance
#lm_CoreDependents %>% group_by(Dependency) %>% summarise(DistanceWeight = mean(DistanceWeight))
#
## look at existing patterns
##lm_CoreDependents[order(lm_CoreDependents$Language, lm_CoreDependents$DH, lm_CoreDependents$DistanceWeight),]
#
#
#library(data.table)
#lm_CoreDependents = dcast(setDT(lm_CoreDependents), Language +FileName ~ Dependency, value.var=c("DistanceWeight", "DH", "DH_Weight"), na.rm=TRUE)
#
####################
#
## Left-Right Asymmetry
##summary(lmer(DistanceWeight ~ DH_Weight + (1|ModelName) + (1|FileName) + (1|Head) + (1|Dependency) + (1|Dependent), data=dataLM))
#
## entropy regularization masks this effect
##summary(lmer(DistanceWeight ~ DH_Weight*EntropyWeight + (1|FileName) + (1|Head) + (1|Dependency) + (1|Dependent), data=dataLM))
#
####################
#
## nominal vs pronominal core arguments
#
##lmPerHeadDep = as.data.frame(dataLM %>% group_by(Head,Dependency,  Language, FileName, AverageLoss) %>% summarise(DH_Weight = weighted.mean(DH_Weight, Count, na.rm=TRUE), DistanceWeight = weighted.mean(DistanceWeight, Count), DH = weighted.mean(DH, Count, na.rm=TRUE), Count=sum(Count)))
#
#
#lmNounPronArgs = dataLM %>% filter(Head == "VERB", Dependent %in% c("NOUN", "PRON")) %>% group_by(Dependency, Dependent, Language, FileName, AverageLoss) %>% summarise(DH_Weight = weighted.mean(DH_Weight, Count, na.rm=TRUE), DistanceWeight = weighted.mean(DistanceWeight, Count), DH = weighted.mean(DH, Count, na.rm=TRUE), Count=sum(Count))
#
#lm_NounPronArgs = lmNounPronArgs %>% filter(Dependency %in% c("obj", "iobj", "nsubj", "nsubjpass"))
#
#library(data.table)
#lm_NounPronArgs = dcast(setDT(lm_NounPronArgs), Language +FileName ~ Dependency + Dependent, value.var=c("DistanceWeight", "DH_Weight"), na.rm=TRUE)
#
#
######### 
## 7: SOV (/+OSV) => Adv before Verb
#lmG7_1 = dataLM %>% filter(Head == "VERB", Dependent == "ADV", Dependency == "advmod") %>% select(Language, FileName, DH_Weight) %>% rename(DH_advmod = DH_Weight)
#lmG7_2 = dataLM %>% filter(Head == "VERB", Dependent == "NOUN", Dependency == "obj") %>% select(Language, FileName, DH_Weight) %>% rename(DH_obj = DH_Weight)
#lmG7 = merge(lmG7_1, lmG7_2, by=c("Language", "FileName"))
#cor.test(lmG7$DH_advmod, lmG7$DH_obj)
## there seems to be nothing here
#
#
## 13. "If the nominal object always precedes the verb, then verb forms subordinate to the main verb also precede it."
#
## 16. "In languages with dominant order VSO, an inflected auxiliary always precedes the main verb. In languages with dominant order SOV, an inflected auxiliary always follows the main verb."
#
#
## sounds like dependency length
#
## 17. "With overwhelmingly more than chance frequency, languages with dominant order VSO have the adjective after the noun." [but see Dryer 1992]
#
## sounds like dependency length
#
## 18. "When the descriptive adjective precedes the noun, the demonstrative and the numeral, with overwhelmingly more than chance frequency, do likewise."
#
#summary(lm_G_20 %>% filter(DH_Weight_amod > 0))
## seems potentially opposite to prediction
#
## 21. "If some or all adverbs follow the adjective they modify, then the language is one in which the qualifying adjective follows the noun and the verb precedes its nominal object as the dominant order."
#
#lmG21_1 = dataLM %>% filter(Head == "ADJ", Dependent == "ADV", Dependency == "advmod") %>% select(Language, FileName, DH_Weight) %>% rename(DH_advmod = DH_Weight)
#lmG21_2 = dataLM %>% filter(Head == "NOUN", Dependent == "ADJ", Dependency == "amod") %>% select(Language, FileName, DH_Weight)%>% rename(DH_amod = DH_Weight)
#lmG21_3 = dataLM %>% filter(Head == "VERB", Dependent == "NOUN", Dependency == "obj") %>% select(Language, FileName, DH_Weight)%>% rename(DH_obj = DH_Weight)
#lmG21 = merge(lmG21_1, lmG21_2, by=c("Language", "FileName"))
#lmG21 = merge(lmG21,   lmG21_3, by=c("Language", "FileName"))
#summary(lmG21[lmG21$DH_advmod < 0,])
## there seems to be nothing here (sounds more like dependency length)
#
#
#
#
## 23. "If in apposition the proper noun usually precedes the common noun, then the language is one in which the governing noun precedes its dependent genitive. With much better than chance frequency, if the common noun usually precedes the proper noun, the dependent genitive precedes its governing noun."
#
## 24. "If the relative expression precedes the noun either as the only construction or as an alternate construction, either the language is postpositional, or the adjective precedes the noun or both."
#
#
#
#
###############################
## 25. "If the pronominal object follows the verb, so does the nominal object."
#lmG25 = dataLM %>% filter(Head == "VERB", Dependent %in% c("NOUN", "PRON")) %>% group_by(Dependency, Dependent, Language, FileName, AverageLoss) %>% summarise(DH_Weight = weighted.mean(DH_Weight, Count, na.rm=TRUE), DistanceWeight = weighted.mean(DistanceWeight, Count), DH = weighted.mean(DH, Count, na.rm=TRUE), Count=sum(Count))
#
#lmG25 = lmG25 %>% filter(Dependency %in% c("obj"))
#
#lmG25 = dcast(setDT(lmG25), Language +FileName ~ Dependency + Dependent, value.var=c("DH_Weight"), na.rm=TRUE)
#
#summary(lmG25 %>% filter(obj_PRON < 0))
## expect obj_NOUN to be negative, but there is nothing
#
#
#
#
#
## prepositions vs postpositions
#
#
#prePostpos = dataLM %>% filter(Head == "NOUN", Dependency == "case", Dependent == "ADP") %>% group_by(Dependency, Dependent, Language, FileName, AverageLoss) %>% summarise(DH_Weight = weighted.mean(DH_Weight, Count, na.rm=TRUE), DistanceWeight = weighted.mean(DistanceWeight, Count), DH = weighted.mean(DH, Count, na.rm=TRUE), Count=sum(Count))
#
#
#library(data.table)
#prePostpos = dcast(setDT(prePostpos), Language +FileName ~ 1, value.var=c("DistanceWeight", "DH_Weight"), na.rm=TRUE)
#
#prePostposByLan = prePostpos %>% group_by(Language) %>% summarise(DH_Weight_SD = sd(DH_Weight, na.rm=TRUE), DH_Weight = mean(DH_Weight, na.rm=TRUE), DistanceWeight = mean(DistanceWeight, na.rm=TRUE))
## cor(prePostposByLan$DH_Weight, prePostposByLan$DistanceWeight)
## `when before the head, bind more strongly'

#################################

Distance logits for Core Arguments

More negative values indicate stronger preference to be close to the verb. Across languages, there is no significant difference between subjects and objects so far (contrary to what I wrote in the initial draft).

Distance logits for Nominal and Pronominal Arguments

More negative values indicate stronger preference to be close to the verb. Nominal arguments are predicted to come closer than pronominal arguments.

Values for other Dependencies

Here I’m plotting dependent-first (DH) and distance logits, averaged by head POS, dependent POS, and dependency label. While some patterns make sense, these data are somewhat hard to interpret directly. I plot the by-dependents version first, since it seems easiest to interpret.
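The aggregation behind these averages follows the pattern in the commented R code above: count-weighted means per head POS, dependent POS, and dependency label. A rough pandas equivalent, with invented toy values (the real analysis uses the R `weighted.mean(..., Count)` calls shown earlier):

```python
import pandas as pd

# Toy stand-in for dataLM; column names follow the R code, values are made up.
dataLM = pd.DataFrame({
    "Head":           ["VERB", "VERB", "NOUN", "NOUN"],
    "Dependent":      ["NOUN", "NOUN", "ADJ",  "ADJ"],
    "Dependency":     ["obj",  "obj",  "amod", "amod"],
    "DH_Weight":      [0.4,  0.8,  -1.0, -0.5],
    "DistanceWeight": [-0.3, -0.7, -1.2, -0.8],
    "Count":          [100,  300,  200,  200],
})

# Count-weighted means, analogous to weighted.mean(DH_Weight, Count) in R.
tmp = dataLM.assign(
    dh_w=dataLM["DH_Weight"] * dataLM["Count"],
    dist_w=dataLM["DistanceWeight"] * dataLM["Count"],
)
agg = tmp.groupby(["Head", "Dependent", "Dependency"])[["dh_w", "dist_w", "Count"]].sum()
agg["DH_Weight"] = agg.pop("dh_w") / agg["Count"]
agg["DistanceWeight"] = agg.pop("dist_w") / agg["Count"]

print(agg)
# e.g. VERB/NOUN/obj: DH_Weight = (0.4*100 + 0.8*300) / 400 = 0.7
```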

Averaged Distance Predictions across Dependents

Here I’m plotting the inferred distance logits.

averaged over languages

Averaged Order Predictions across Dependents

Averaged Distance Predictions across Heads

averaged over languages

Averaged Order Predictions across Heads

Averaged Distance Predictions across Dependencies

averaged over languages

Averaged Order Predictions across Dependencies

averaged over languages